A Frequency Warping Approach to Speaker Normalization, IEEE Transactions on Speech and Audio Processing
Abstract
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means of estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques with other well-known techniques for reducing variability is described. The results show that frequency warping consistently reduces word error rate by 20%, even for very short utterances.
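The abstract implements the warp by modifying the mel filterbank instead of resampling the signal. The sketch below illustrates that idea under several stated assumptions. It moves the edge frequencies of the triangular mel filters with a warping factor `alpha` and leaves the FFT bins untouched. Assumptions:

- All function names are hypothetical.
- The mel formula is the common HTK-style one.
- Pure linear scaling would push filters past the Nyquist frequency when `alpha > 1`. The sketch therefore swaps in a piecewise-linear warp, a common VTLN variant, with an assumed knee at 85% of the band edge. The abstract itself only says "linear".
- The direction convention (`alpha > 1` moves filters upward) is also an assumption.

```python
import math
from typing import List, Optional

def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (assumed choice of mel formula)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warp_frequency(f: float, alpha: float, f_high: float, knee_ratio: float = 0.85) -> float:
    """Assumed piecewise-linear warp: slope alpha up to a knee frequency f0,
    then a straight line that maps f_high onto itself so filters stay in band."""
    f0 = knee_ratio * f_high / max(alpha, 1.0)  # guarantees alpha * f0 < f_high
    if f <= f0:
        return alpha * f
    slope = (f_high - alpha * f0) / (f_high - f0)
    return alpha * f0 + slope * (f - f0)

def warped_mel_filterbank(alpha: float, n_filters: int, n_fft: int, sample_rate: float,
                          f_low: float = 0.0, f_high: Optional[float] = None) -> List[List[float]]:
    """Triangular mel filters whose edge frequencies are moved by warp_frequency."""
    if f_high is None:
        f_high = sample_rate / 2.0
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 points equally spaced in mel; filter j spans edges[j-1]..edges[j+1]
    edges = [mel_to_hz(mel_lo + i * (mel_hi - mel_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    # Warp only the filter positions; the spectrum's frequency bins are unchanged
    edges = [warp_frequency(f, alpha, f_high) for f in edges]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

if __name__ == "__main__":
    # 20 filters for 8 kHz telephone speech with a 512-point FFT, warped by 1.1
    bank = warped_mel_filterbank(1.1, 20, 512, 8000)
    print(len(bank), len(bank[0]))
```

For each candidate `alpha`, features would then follow the usual recipe: log filterbank energies of each power-spectrum frame, then the DCT. Only the filterbank changes, which is what makes this style of warping cheap.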
Similar sources
A frequency warping approach to speaker normalization
In an effort to reduce the degradation in speech recognition performance caused by variations in vocal tract shape among speakers, this thesis studies a set of low-complexity, maximum-likelihood-based speaker normalization procedures. By approximately modeling the vocal tract as a simple acoustic tube, these procedures compensate for the effects of the variations in vocal tract length by linearl...
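The acoustic-tube approximation helps explain why a single scaling factor is a plausible model of vocal tract length differences. Under the textbook idealization of a uniform, lossless tube of length $L$, closed at the glottis and open at the lips, the resonances below follow. The model is a simplification, since real vocal tracts are not uniform. The numbers are illustrative and assume a speed of sound $c \approx 35000$ cm/s.

```latex
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots

% Two speakers A and B with tract lengths L_A and L_B:
\frac{F_n^{(B)}}{F_n^{(A)}} = \frac{L_A}{L_B} \qquad \text{for every } n

% Worked example (c \approx 35000 cm/s):
% L_A = 17.5 cm:  F_1, F_2, F_3 = 500, 1500, 2500 Hz
% L_B = 14.0 cm:  F_1, F_2, F_3 = 625, 1875, 3125 Hz   (ratio 17.5 / 14 = 1.25)
```

Because every resonance scales by the same ratio, length differences appear in this idealized model as a uniform linear stretch of the frequency axis.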
Fast estimation of warping factors in vocal tract length normalization using scores obtained from gender-detection modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels, and environmental conditions. Making these systems robust to such variations remains a major challenge. One of the main sources of inter-speaker variation is differences in vocal tract length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
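Estimating a VTLN warping factor often means an exhaustive maximum-likelihood search: compute features for each candidate factor, score them against the acoustic model, and keep the best-scoring factor. The sketch below shows that search, not the fast method named in the title above. The names `extract_features` and `log_likelihood` are hypothetical placeholders. In a real system they would compute warped cepstra and score them with the recognizer's models. The 0.88 to 1.12 grid with step 0.02 is an assumed, commonly used range.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def default_warp_grid(lo: float = 0.88, hi: float = 1.12, step: float = 0.02) -> List[float]:
    """Candidate warping factors (assumed range); rounding avoids float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 4) for i in range(n + 1)]

def estimate_warp_factor(
    extract_features: Callable[[float], Sequence[float]],  # hypothetical: features for a given alpha
    log_likelihood: Callable[[Sequence[float]], float],    # hypothetical: model score of features
    alphas: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Exhaustive maximum-likelihood search: warp, score, keep the best alpha."""
    if alphas is None:
        alphas = default_warp_grid()
    best_alpha, best_score = alphas[0], float("-inf")
    for alpha in alphas:
        score = log_likelihood(extract_features(alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

if __name__ == "__main__":
    # Toy stand-ins (hypothetical): features scale linearly with alpha, and the
    # "model" prefers features close to a reference vector.
    reference = [1.0, 2.0, 3.0]
    speaker = [r / 1.06 for r in reference]
    best, score = estimate_warp_factor(
        lambda a: [a * s for s in speaker],
        lambda feats: -sum((f - r) ** 2 for f, r in zip(feats, reference)),
    )
    print(best)
```

The cost grows linearly with the number of candidate factors, because each one requires a full feature extraction and scoring pass.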
Robust recognition of children's speech
Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variability poses challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency sc...
Pitch synchronized speech processing (PSSP) for speaker recognition
A method for speech signal enhancement is developed with application to automatic speaker recognition where the signals have different channel conditions. The basis of this technique is a robust pitch detection algorithm that accurately estimates the instantaneous pitch rate, and extracts single pitch period speech segments. This technique of pitch synchronized speech processing (PSSP) provides...
Speaker normalization through formant-based warping of the frequency scale
Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to model acoustical variability among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers. Recent successful speaker normalization algorithms ...